Goto

Collaborating Authors

 excitation module


An Empirical Study of Excitation and Aggregation Design Adaptions in CLIP4Clip for Video-Text Retrieval

Jing, Xiaolun, Yang, Genke, Chu, Jian

arXiv.org Artificial Intelligence

CLIP4Clip model transferred from the CLIP has been the de-factor standard to solve the video clip retrieval task from frame-level input, triggering the surge of CLIP4Clip-based models in the video-text retrieval domain. In this work, we rethink the inherent limitation of widely-used mean pooling operation in the frame features aggregation and investigate the adaptions of excitation and aggregation design for discriminative video representation generation. We present a novel excitationand-aggregation design, including (1) The excitation module is available for capturing non-mutuallyexclusive relationships among frame features and achieving frame-wise features recalibration, and (2) The aggregation module is applied to learn exclusiveness used for frame representations aggregation. Similarly, we employ the cascade of sequential module and aggregation design to generate discriminative video representation in the sequential type. Besides, we adopt the excitation design in the tight type to obtain representative frame features for multi-modal interaction. The proposed modules are evaluated on three benchmark datasets of MSR-VTT, ActivityNet and DiDeMo, achieving MSR-VTT (43.9 R@1), ActivityNet (44.1 R@1) and DiDeMo (31.0 R@1). They outperform the CLIP4Clip results by +1.2% (+0.5%), +4.5% (+1.9%) and +9.5% (+2.7%) relative (absolute) improvements, demonstrating the superiority of our proposed excitation and aggregation designs. We hope our work will serve as an alternative for frame representations aggregation and facilitate future research.


Identification of morphological fingerprint in perinatal brains using quasi-conformal mapping and contrastive learning

Wang, Boyang, Zheng, Weihao, Wang, Ying, Zhang, Zhe, Sheng, Yuchen, Wang, Minmin

arXiv.org Artificial Intelligence

The morphological fingerprint in the brain is capable of identifying the uniqueness of an individual. However, whether such individual patterns are present in perinatal brains, and which morphological attributes or cortical regions better characterize the individual differences of ne-onates remain unclear. In this study, we proposed a deep learning framework that projected three-dimensional spherical meshes of three morphological features (i.e., cortical thickness, mean curvature, and sulcal depth) onto two-dimensional planes through quasi-conformal mapping, and employed the ResNet18 and contrastive learning for individual identification. We used the cross-sectional structural MRI data of 682 infants, incorporating with data augmentation, to train the model and fine-tuned the parameters based on 60 infants who had longitudinal scans. The model was validated on 30 longitudinal scanned infant data, and remarkable Top1 and Top5 accuracies of 71.37% and 84.10% were achieved, respectively. The sensorimotor and visual cortices were recognized as the most contributive regions in individual identification. Moreover, the folding morphology demonstrated greater discriminative capability than the cortical thickness, which could serve as the morphological fingerprint in perinatal brains. These findings provided evidence for the emergence of morphological fingerprints in the brain at the beginning of the third trimester, which may hold promising implications for understanding the formation of in-dividual uniqueness in the brain during early development.


EEG motor imagery decoding: A framework for comparative analysis with channel attention mechanisms

Wimpff, Martin, Gizzi, Leonardo, Zerfowski, Jan, Yang, Bin

arXiv.org Artificial Intelligence

Objective The objective of this study is to investigate the application of various channel attention mechanisms within the domain of brain-computer interface (BCI) for motor imagery decoding. Channel attention mechanisms can be seen as a powerful evolution of spatial filters traditionally used for motor imagery decoding. This study systematically compares such mechanisms by integrating them into a lightweight architecture framework to evaluate their impact. Approach We carefully construct a straightforward and lightweight baseline architecture designed to seamlessly integrate different channel attention mechanisms. This approach is contrary to previous works which only investigate one attention mechanism and usually build a very complex, sometimes nested architecture. Our framework allows us to evaluate and compare the impact of different attention mechanisms under the same circumstances. The easy integration of different channel attention mechanisms as well as the low computational complexity enables us to conduct a wide range of experiments on three datasets to thoroughly assess the effectiveness of the baseline model and the attention mechanisms. Results Our experiments demonstrate the strength and generalizability of our architecture framework as well as how channel attention mechanisms can improve the performance while maintaining the small memory footprint and low computational complexity of our baseline architecture. Significance Our architecture emphasizes simplicity, offering easy integration of channel attention mechanisms, while maintaining a high degree of generalizability across datasets, making it a versatile and efficient solution for EEG motor imagery decoding within brain-computer interfaces.


Squeeze aggregated excitation network

N, Mahendran

arXiv.org Artificial Intelligence

Convolutional neural networks have spatial representations which read patterns in the vision tasks. Squeeze and excitation links the channel wise representations by explicitly modeling on channel level. Multi layer perceptrons learn global representations and in most of the models it is used often at the end after all convolutional layers to gather all the information learned before classification. We propose a method of inducing the global representations within channels to have better performance of the model. We propose SaEnet, Squeeze aggregated excitation network, for learning global channelwise representation in between layers. The proposed module takes advantage of passing important information after squeeze by having aggregated excitation before regaining its shape. We also introduce a new idea of having a multibranch linear(dense) layer in the network. This learns global representations from the condensed information which enhances the representational power of the network. The proposed module have undergone extensive experiments by using Imagenet and CIFAR100 datasets and compared with closely related architectures. The analyzes results that proposed models outputs are comparable and in some cases better than existing state of the art architectures.


Variations of Squeeze and Excitation networks

NV, Mahendran

arXiv.org Artificial Intelligence

Convolutional neural networks learns spatial features and are heavily interlinked within kernels. The SE module have broken the traditional route of neural networks passing the entire result to next layer. Instead SE only passes important features to be learned with its squeeze and excitation (SE) module. We propose variations of the SE module which improvises the process of squeeze and excitation and enhances the performance. The proposed squeezing or exciting the layer makes it possible for having a smooth transition of layer weights. These proposed variations also retain the characteristics of SE module. The experimented results are carried out on residual networks and the results are tabulated.